Abstract

We study the use of local heuristics to determine spanning subgraphs for use in the dissemination of information in complex networks. We introduce two different heuristics and analyze their behavior in giving rise to spanning subgraphs that perform well in terms of allowing every node of the network to be reached, of requiring relatively few messages and small node bandwidth for information dissemination, and also of stretching paths with respect to the underlying network only modestly. We contribute a detailed mathematical analysis of one of the heuristics and provide extensive simulation results on random graphs for both of them. These results indicate that, within certain limits, spanning subgraphs are indeed expected to emerge that perform well with respect to all requirements. We also discuss the spanning subgraphs' inherent resilience to failures and adaptability to topological changes.

Keywords: Complex networks, Local heuristics, Spanning subgraphs. 

1 Introduction 

Let $G = (N_G, E_G)$ be an undirected graph with $n = |N_G|$ nodes and edges representing bidirectional links for pairwise communication among the nodes. We regard $G$ as standing for some unstructured, real-world network whose nodes have no more information on the overall topology of $G$ than can be inferred from their immediate neighborhoods. Given these characteristics, $G$ can also be seen as belonging to the class of networks that have recently come to be referred to as complex networks.

Many of the typical problems that require a distributed solution by the nodes of $G$ involve the need to disseminate a piece of information, call it $I$, through the nodes of the network that share the same connected component with the node that originally possesses $I$. We assume that this node is unique with respect to that particular information dissemination and refer to it as the originator.

There is a host of possibilities to solve this basic problem of disseminating information through the nodes of $G$, but invariably they either depend on the existence of a spanning subgraph of $G$ on whose edges the dissemination is performed, or else they employ straightforward flooding of the network's edges by copies of $I$. The former alternative is often regarded as substantially more cost-effective in terms of several quantities of interest, but of course it carries with it the inherent need for the desired spanning subgraph to be initially determined and subsequently maintained if the network undergoes topological changes 0].

Subgraphs of interest in this context include the well-known minimum spanning trees, for which several procedures related to creation and maintenance are available [SUSIE], and also the more general, so-called spanners, which bring with them well-defined structural requirements related to efficiency indicators but are, on the other hand, considerably less well known [13, 17]. But regardless of the particular guise of the spanning subgraphs of $G$ for use in information dissemination, the importance of studying them in detail has in recent years found strong justification on the practical side. Notable examples here include the case in which $G$ is some virtual supergraph of a physical network; in this case, spanning subgraphs of $G$ are needed to function as the so-called overlay networks for end-to-end communication over the underlying physical network [7].

In this paper we focus on one very basic question related to determining a spanning subgraph of a complex network $G$: how close can we get to obtaining a subgraph of $G$ that can be used to disseminate $I$ through the nodes of the originator's connected component, while at the same time satisfying some basic set of performance requirements, if nodes are only allowed to use local information (i.e., information that can be obtained from no farther than the nodes' immediate neighborhoods)? Important requirements involve the expected number of nodes reached when $I$ is disseminated, the expected number of copies of $I$ that are needed, the expected degree of each node in the subgraph (since it relates closely to how many copies of $I$ a node can concurrently send out given its bandwidth limitations), and also the expected path length from the originator on the subgraph.

Even though of a fundamental nature, this question is admittedly too general for an objective analysis. For this reason, we concentrate on the narrower issue of investigating what happens in terms of the aforementioned performance requirements when a subgraph of $G$ is determined in a fully distributed fashion by the nodes according to the following strictly local rules. Each node is responsible for choosing some of its own neighbors in the subgraph and makes its choices in two successive steps: first, one node is picked from among the node's neighbors; then, with some nonzero probability that we call $\alpha$, the node gets to pick a second neighbor (which we allow to be identical to the first one it picked).
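The two-step rule can be sketched in Python as follows, taking the uniform approach (introduced below) as the choice criterion. The function name `uniform_choices` and its return convention are illustrative assumptions, not taken from the paper; isolated nodes, which the rule never needs to handle, simply make no choice in this sketch.

```python
import random


def uniform_choices(neighbors, alpha, rng=random):
    """Two-step local rule, uniform approach (illustrative sketch).

    neighbors: list of u's neighbors in G.
    alpha: probability of making a second pick.
    Returns (c_u, C_u): number of picks made and the set of chosen neighbors.
    """
    if not neighbors:
        return 0, set()                    # isolated node: nothing to pick
    first = rng.choice(neighbors)          # first pick is always made
    if rng.random() < alpha:               # second pick with probability alpha
        second = rng.choice(neighbors)     # may repeat the first pick
        return 2, {first, second}
    return 1, {first}
```

With `alpha = 0.5` and four neighbors, the empirical frequency of $|C_u| = 1$ should approach $1 - \alpha + \alpha/a = 0.625$, the value derived in Section 2.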

If this simple procedure is performed by all the nodes of $G$, then clearly the resulting subgraph, which we denote by $D = (N_D, E_D)$, has $N_D = N_G$, while $E_D$ contains all the edges $(u, v)$ of $G$ such that $u$ chose $v$ at least once or $v$ chose $u$ at least once. This subgraph is then necessarily a spanning subgraph of $G$; using it for disseminating $I$ from the originator is simply a question of having the originator send $I$ to all of its neighbors in $D$, and similarly for all the other nodes when they receive $I$ for the first time. Our four performance requirements now have to be examined in terms of how the connected components of $D$ relate to those of $G$. In particular, does the connected component of $D$ to which the originator belongs span the entire connected component of $G$ that contains it?
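A minimal sketch of how $D$ might be assembled from all nodes' choices and then used for flooding, again under the uniform criterion. The adjacency-dictionary representation and the names `build_dissemination_subgraph` and `disseminate` are hypothetical. Counting one message per $D$-neighbor of every reached node mirrors the rule that each node forwards $I$ to all its $D$-neighbors on first receipt.

```python
import random
from collections import deque


def build_dissemination_subgraph(adj, alpha, rng):
    """Build D from G under the uniform approach (sketch).

    adj maps each node to the list of its neighbors in G (undirected).
    Edge (u, v) enters D if u chose v or v chose u.
    """
    d_adj = {u: set() for u in adj}
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        chosen = [rng.choice(nbrs)]            # first pick
        if rng.random() < alpha:
            chosen.append(rng.choice(nbrs))    # optional second pick
        for v in chosen:
            d_adj[u].add(v)
            d_adj[v].add(u)                    # D is undirected
    return d_adj


def disseminate(d_adj, originator):
    """Flood I over D: each node sends I to all of its D-neighbors the first
    time it receives I. Returns (set of reached nodes, number of messages)."""
    reached = {originator}
    frontier = deque([originator])
    messages = 0
    while frontier:
        u = frontier.popleft()
        for v in d_adj[u]:
            messages += 1                      # one copy of I per D-neighbor
            if v not in reached:               # v forwards only on first receipt
                reached.add(v)
                frontier.append(v)
    return reached, messages
```

On a three-node path every node's choices are forced onto the path's edges, so $D$ equals $G$ and flooding from an end node sends four messages.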

Throughout the paper we use $c_u$ to denote the number of choices made by node $u$ and $C_u$ to denote the set of nodes (some of $u$'s neighbors) that get chosen by $u$. Clearly, $1 \le |C_u| \le c_u \le 2$, and the set of $u$'s neighbors in $D$ is given by $C_u$, possibly enlarged by every other node $v$ that is a neighbor of $u$ in $G$ and such that $u \in C_v$.

We call $D$ a dissemination subgraph of $G$ and devote the remainder of the paper to analyzing its properties. Our analysis depends, naturally, on the specific criteria that each node uses when making its two decisions. We consider two possibilities, of which the simpler, referred to as the uniform approach, lets each node make its choices uniformly at random among its neighbors. Our analytical treatment of $D$'s properties in this case is given in Section 2; it is based on regarding $G$ as a random graph and makes use of the principles laid down in [16]. This is our core section, and its results are complemented by the simulation results we present in Section 3 for Poisson-distributed node degrees (the classic Erdős–Rényi model) and also for degrees distributed according to a power law (recently discovered to be approximately representative of relevant real-world networks [IT3 El 12]).

Even though the analytical treatment we offer in Section 2 is specific to this simpler choice criterion, at this point it seems to be as far as we have the means to go. For this reason, and notwithstanding the second possibility's clear superiority in terms of our stated performance requirements (see below), we treat that possibility only by means of simulations, whose results are described in Section 3 along with those for the uniform approach. In any event, our mathematical analysis of the uniform approach is innovative, and we believe it may yield interesting insight into the analysis of similar problems on complex networks. It may also ultimately be possible to generalize it to handle the more complicated case of the second choice criterion.

We refer to the second possibility as the degree-based approach. In this approach, the first choice by a node selects uniformly at random from those of its neighbors that have the highest degree (if only one neighbor has the highest degree, then the first choice degenerates into a deterministic decision). The second choice selects a neighbor at random with probability proportional to its degree. The degree-based approach clearly draws on more information about the network's topology than the uniform approach. For this reason, it is expected to surpass the uniform approach in terms of the indicators we have informally introduced. That this is indeed the case is apparent from the simulation results we show in Section 3. But, even if this is only to be expected, we also find it remarkable that such a simple strategy of strictly local nature should support a positive answer to our original question to the extent that it does.
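The degree-based rule admits an equally short sketch. The helper name and the `degree` lookup table are assumptions made for illustration; Python's `random.choices` with a `weights` argument provides the degree-proportional second pick.

```python
import random


def degree_based_choices(neighbors, degree, alpha, rng=random):
    """Two-step local rule, degree-based approach (illustrative sketch).

    neighbors: list of u's neighbors in G; degree: dict node -> degree in G.
    First pick: uniform among the neighbors of highest degree.
    Second pick (probability alpha): a neighbor drawn with probability
    proportional to its degree; it may coincide with the first pick.
    """
    if not neighbors:
        return 0, set()
    top = max(degree[v] for v in neighbors)
    first = rng.choice([v for v in neighbors if degree[v] == top])
    if rng.random() < alpha:
        weights = [degree[v] for v in neighbors]
        second = rng.choices(neighbors, weights=weights, k=1)[0]
        return 2, {first, second}
    return 1, {first}
```

With a unique highest-degree neighbor and `alpha = 0`, the rule is fully deterministic, matching the degenerate case mentioned above.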

We complement our study of our local, two-choice scheme for approximating a spanning subgraph of $G$ by elaborating on its resilience and adaptability properties. These are crucial in the context of networks that may undergo topological changes, and we treat them in Section 4. To finalize, we offer concluding remarks in Section 5.



2 Mathematical analysis 

Henceforth we regard $G$ as a random graph having node degrees distributed independently from one another and identically to a random variable $K_G$. Furthermore, the nodes of $G$ are assumed to be connected to one another at random given their degrees, so the degrees of two adjacent nodes remain independent. Our results in this section target the case of a formally infinite set of nodes, that is, the case in which $n \to \infty$.

Let $P_G(a)$ be the probability that a randomly chosen node of $G$ has degree $a$. The average degree of $G$, denoted by $z_G$, is $z_G = \sum_{a=0}^{n-1} a P_G(a)$. Also, given the random nature of node interconnections in $G$, the probability that some node's neighbor has degree $b$ is equal to the expected fraction of edges incident to degree-$b$ nodes, which is given by

$$
\frac{bP_G(b)}{\sum_{a=0}^{n-1} aP_G(a)} = \frac{bP_G(b)}{z_G}. \tag{1}
$$



It is known how to characterize the existence in $G$ of a large, size-$O(n)$ connected component, commonly known as the giant connected component of $G$ (henceforth denoted by $\mathrm{GCC}_G$). If $\langle K_G^2 \rangle$ denotes the second moment of the random variable $K_G$, that is, $\langle K_G^2 \rangle = \sum_{a=0}^{n-1} a^2 P_G(a)$, then $\mathrm{GCC}_G$ almost surely exists if and only if

$$
\frac{\langle K_G^2 \rangle}{z_G} > 2 \tag{2}
$$

or, equivalently,

$$
\sum_{a=0}^{n-1} a\,\frac{aP_G(a)}{z_G} > 2. \tag{3}
$$



Intuitively, for a randomly chosen node $u$ and letting $v$ be one of its neighbors, this means that $\mathrm{GCC}_G$ almost surely exists (and then $G$ is said to be above the phase transition that gives rise to $\mathrm{GCC}_G$) if and only if $v$ is expected to have strictly more than one neighbor besides $u$; indeed, by (1), (3) is the same as $\sum_{b} (b-1)\,bP_G(b)/z_G > 1$. Otherwise, all the connected components of $G$ are small, consisting of $o(n)$ nodes (and $G$ is said to be below the phase transition).
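A small sketch of criterion (3) in its "more than one other neighbor" form, for a degree distribution given as a truncated list with `p[a]` $= P_G(a)$. The truncation at a maximum degree and the function names are assumptions of this illustration; the Poisson list is built iteratively to avoid overflow in $a!$.

```python
import math


def poisson_pmf(z, amax):
    """P_G(a) = e^{-z} z^a / a! for a = 0..amax, built iteratively."""
    p = [math.exp(-z)]
    for a in range(1, amax + 1):
        p.append(p[-1] * z / a)
    return p


def neighbor_degree_pmf(p):
    """Eq. (1): probability b P_G(b) / z_G that a node's neighbor has degree b."""
    z = sum(a * pa for a, pa in enumerate(p))
    return [b * pb / z for b, pb in enumerate(p)]


def above_phase_transition(p):
    """Criterion (3): a neighbor v of u is expected to have strictly more
    than one neighbor besides u, i.e. sum_b (b - 1) b P_G(b) / z_G > 1."""
    excess = sum((b - 1) * w for b, w in enumerate(neighbor_degree_pmf(p)))
    return excess > 1
```

For Poisson degrees the expected excess is $z$, so the check reduces to $z > 1$; a 2-regular graph sits exactly at the threshold and is reported as not above it.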

Now let $D$ be a dissemination subgraph of $G$ constructed by the uniform approach; it is also a random graph, and we let $\mathrm{GCC}_D$ denote its giant connected component. Given a randomly chosen, degree-$a$ node $u$ of $G$, let $\pi_r^{(a)}(1)$ denote the probability that $|C_u| = 1$. If $a > 0$, then

$$
\pi_r^{(a)}(1) = 1 - \alpha + \frac{\alpha}{a}, \tag{4}
$$

which reflects the probability that either $c_u = 1$, or $c_u = 2$ but both of $u$'s choices were identical. It follows that the probability that $|C_u| = 2$ is

$$
\pi_r^{(a)}(2) = 1 - \pi_r^{(a)}(1) = \frac{\alpha(a-1)}{a}. \tag{5}
$$

For $a = 0$, clearly $\pi_r^{(0)}(1) = \pi_r^{(0)}(2) = 0$.

We now pause momentarily to note that, throughout the paper, we employ the mnemonic "r" when referring to randomly chosen nodes, as in the preceding paragraph. Furthermore, when considering a node $v$ reached by following one of its incident edges, say $(u, v)$ for some neighbor $u$ of $v$, we use the mnemonics "c", "nc,1", and "nc,2" to refer, respectively, to the cases of $v \in C_u$, $v \notin C_u$ with $c_v = 1$, and $v \notin C_u$ with $c_v = 2$. These mnemonics are intended to facilitate the use of the probabilities calculated above (and also the ones we are about to calculate) later in this section.

Let $u$ be a randomly chosen node and $v$ one of its neighbors in $G$ such that $v \in C_u$. Nodes $u$ and $v$ are then neighbors in $D$. Let $b$ be the degree of $v$ in $G$. The probability that $|C_v \setminus \{u\}| = 0$ (i.e., $v$ chose $u$ and no other neighbor besides $u$), which we denote by $\pi_c^{(b)}(0)$, is the probability that all of $v$'s choices resulted in $u$, that is,

$$
\pi_c^{(b)}(0) = (1-\alpha)\frac{1}{b} + \alpha\frac{1}{b^2} = \frac{1 - \alpha + \alpha/b}{b}. \tag{6}
$$

Similarly, the probability that $|C_v \setminus \{u\}| = 1$, which we denote by $\pi_c^{(b)}(1)$, is

$$
\pi_c^{(b)}(1) = (1-\alpha)\frac{b-1}{b} + \alpha\,\frac{3(b-1)}{b^2} = \frac{b-1}{b}\left(1 - \alpha + \frac{3\alpha}{b}\right), \tag{7}
$$

which means that either $c_v = 1$ and $u \notin C_v$, or $c_v = 2$ and either $v$ chose $u$ exactly once or it did not but its choices were identical. Finally, the probability that $|C_v \setminus \{u\}| = 2$, which we denote by $\pi_c^{(b)}(2)$, is the probability that $c_v = 2$ while $v$'s choices were distinct both from each other and from $u$, that is,

$$
\pi_c^{(b)}(2) = \alpha\left(\frac{b-1}{b}\right)\left(\frac{b-2}{b}\right). \tag{8}
$$

(It is worth noting that $\pi_c^{(b)}(0)$, $\pi_c^{(b)}(1)$, and $\pi_c^{(b)}(2)$ remain as calculated even without the condition that $v \in C_u$, and in this case the use of "c" is pointless. We do insist on $v \in C_u$, however, because this is the context in which the three probabilities are used in the sequel.)

In an analogous way, let us consider a randomly chosen node $u$ and a degree-$b$ neighbor $v$ of $u$ in $G$ such that $v \notin C_u$. If this is the case, then $u$ and $v$ are neighbors in $D$ if and only if $u \in C_v$. If $c_v = 1$, then $u \in C_v$ if and only if $|C_v \setminus \{u\}| = 0$, and this happens with a probability that we denote by $\pi_{nc,1}^{(b)}(0)$ and is such that

$$
\pi_{nc,1}^{(b)}(0) = \frac{1}{b}. \tag{9}
$$

So $1 - \pi_{nc,1}^{(b)}(0)$ is the probability that $u$ and $v$ are not neighbors in $D$, given that $v \notin C_u$ and $c_v = 1$. If $c_v = 2$, then the probability that $u \in C_v$ is $(2b-1)/b^2$, as we see from the fact that either $|C_v \setminus \{u\}| = 0$ or $|C_v \setminus \{u\}| = 1$ may happen. The probability of the former, denoted by $\pi_{nc,2}^{(b)}(0)$, is given by

$$
\pi_{nc,2}^{(b)}(0) = \frac{1}{b^2}, \tag{10}
$$

while the probability of the latter, denoted by $\pi_{nc,2}^{(b)}(1)$, refers to exactly one of $v$'s choices resulting in $u$. Thence

$$
\pi_{nc,2}^{(b)}(1) = \frac{2(b-1)}{b^2}. \tag{11}
$$

So, given that $v \notin C_u$ and $c_v = 2$, $1 - \pi_{nc,2}^{(b)}(0) - \pi_{nc,2}^{(b)}(1)$ is the probability that $u$ and $v$ are not neighbors in $D$.
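The probabilities in (4)–(11) can be checked mechanically by enumerating every outcome of a node's uniform choices. The sketch below (function names hypothetical) uses exact rational arithmetic via Python's `fractions.Fraction`, with $\alpha$ passed as a `Fraction`, and compares (6)–(8) against brute-force enumeration in which neighbor 0 plays the role of $u$.

```python
from fractions import Fraction


def pi_r(a, alpha):
    """Eqs. (4)-(5): (P(|C_u| = 1), P(|C_u| = 2)) for a degree-a node u."""
    if a == 0:
        return Fraction(0), Fraction(0)
    p1 = 1 - alpha + alpha / a
    return p1, 1 - p1


def pi_c(b, alpha):
    """Eqs. (6)-(8): P(|C_v minus {u}| = k) for k = 0, 1, 2, degree-b node v."""
    return ((1 - alpha + alpha / b) / b,
            Fraction(b - 1, b) * (1 - alpha + 3 * alpha / b),
            alpha * Fraction(b - 1, b) * Fraction(b - 2, b))


def pi_nc2(b):
    """Eqs. (10)-(11): given c_v = 2, v chose u twice / exactly once."""
    return Fraction(1, b * b), Fraction(2 * (b - 1), b * b)


def brute_force_pi_c(b, alpha):
    """Enumerate v's uniform choices among neighbors 0..b-1 (u is neighbor 0)
    and accumulate the exact probability of each value of |C_v minus {u}|."""
    dist = [Fraction(0)] * 3
    for x in range(b):                              # c_v = 1
        dist[len({x} - {0})] += (1 - alpha) * Fraction(1, b)
    for x in range(b):                              # c_v = 2
        for y in range(b):
            dist[len({x, y} - {0})] += alpha * Fraction(1, b * b)
    return tuple(dist)
```

Exact fractions make the comparison an equality test rather than a tolerance check.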

In the remainder of this section we analyze the efficacy of the uniform approach with respect to the performance requirements mentioned in Section 1 when $D$ is used to disseminate $I$ from the originator. Recall that we assume the limiting case of $n \to \infty$, so the probability that a given node lies on a cycle of finite length is negligible and the neighborhood of any node is locally tree-like. We also assume that both $G$ and $D$ are above their phase transitions (that is, both $\mathrm{GCC}_G$ and $\mathrm{GCC}_D$ almost surely exist) and predicate our analysis upon the originator being a member of $\mathrm{GCC}_G$.



2.1 Number of nodes reached 

Let $P_n$ be the ratio of the expected number of nodes reached when $I$ is disseminated on $D$'s edges to the expected number of nodes in $\mathrm{GCC}_G$. Let also $\theta_G$ and $\theta_D$ be the fractions of $n$ corresponding to nodes inside $\mathrm{GCC}_G$ and $\mathrm{GCC}_D$, respectively. There are two cases to be considered. The first case is the one in which the originator is a member of $\mathrm{GCC}_D$, which occurs with probability $\theta_D/\theta_G$, this being also the ratio in this case. The second case corresponds to the originator being outside $\mathrm{GCC}_D$, and then the ratio is negligible. Therefore, $P_n$ is given by

$$
P_n = \left(\frac{\theta_D}{\theta_G}\right)^2. \tag{12}
$$

Given a node $u$ and one of its neighbors, say $v$, we define the reach of $u$ through $v$ in $G$ as the set of nodes reachable by a path in $G$ starting at $u$ whose first edge is $(u, v)$. We call $v$ a dead end with respect to $u$ in $G$ if the reach of $u$ through $v$ in $G$ is $o(n)$. Clearly, $v$ is a dead end with respect to $u$ in $G$ (which happens with a probability we denote by $q$) if and only if each of the other neighbors of $v$, in turn, is itself a dead end with respect to $v$ in $G$, which happens with probability $q$ for each of those neighbors as well. If the degree of $v$ is $b$, then it is a dead end with respect to $u$ in $G$ with probability $q^{b-1}$. And since the probability that $v$ has degree $b$ is given as in (1), this leads to

$$
q = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G}\, q^{b-1}. \tag{13}
$$



Similarly, a randomly chosen node is not in $\mathrm{GCC}_G$ if and only if each of its neighbors is a dead end with respect to it in $G$. The expected fraction of nodes inside $\mathrm{GCC}_G$ is then

$$
\theta_G = 1 - \sum_{a=0}^{n-1} P_G(a)\, q^a. \tag{14}
$$
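Equations (13) and (14) can be solved numerically. The sketch below assumes plain fixed-point iteration from $q = 0$ (the text does not prescribe a method) and a degree distribution truncated at a maximum degree; function names are illustrative. For Poisson degrees, (13) and (14) imply $q = 1 - \theta_G$, which offers a quick consistency check.

```python
import math


def poisson_pmf(z, amax):
    """Truncated Poisson degree distribution P_G(a), a = 0..amax."""
    p = [math.exp(-z)]
    for a in range(1, amax + 1):
        p.append(p[-1] * z / a)
    return p


def solve_q(p, tol=1e-13, max_iter=100_000):
    """Smallest root of Eq. (13) in [0, 1] by fixed-point iteration from 0.

    The right-hand side of (13) is increasing in q, so the iterates climb
    monotonically to the smallest fixed point (q = 1 below the transition).
    """
    z = sum(a * pa for a, pa in enumerate(p))
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(b * p[b] / z * q ** (b - 1) for b in range(1, len(p)))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def theta_G(p, q):
    """Eq. (14): expected fraction of nodes inside GCC_G."""
    return 1 - sum(pa * q ** a for a, pa in enumerate(p))
```

For $z = 2$ this gives $\theta_G \approx 0.797$, the familiar root of $1 - \theta = e^{-2\theta}$.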



The value of $\theta_D$ can be obtained in a similar, albeit more complex, way; it relies on definitions of reach and of dead-end nodes in $D$ that are completely analogous to the ones in $G$. Let $u$ be a randomly chosen node and $v$ one of its neighbors in $G$. Let also $q_c$, $q_{nc,1}$, and $q_{nc,2}$ be the conditional probabilities that $v$ is a dead end with respect to $u$ in $D$ given, respectively, that $v \in C_u$, that $v \notin C_u$ with $c_v = 1$, and that $v \notin C_u$ with $c_v = 2$. Regardless of which of these three conditions holds, $v$ is a dead end with respect to $u$ in $D$ if and only if either $v$ is not a neighbor of $u$ in $D$ (which cannot happen under the first condition), or it is but the reach of $u$ through $v$ in $D$ is $o(n)$, which means that all the other neighbors of $v$ in $G$ are themselves dead ends with respect to it in $D$. We may then write

$$
\begin{aligned}
q_c = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big[ &\pi_c^{(b)}(0)\big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-1} \\
&+ \pi_c^{(b)}(1)\, q_c \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-2} \\
&+ \pi_c^{(b)}(2)\, q_c^2 \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-3} \Big],
\end{aligned}
\tag{15}
$$

$$
q_{nc,1} = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big[ 1 - \pi_{nc,1}^{(b)}(0) + \pi_{nc,1}^{(b)}(0)\big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-1} \Big], \tag{16}
$$

and

$$
\begin{aligned}
q_{nc,2} = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big[ &1 - \pi_{nc,2}^{(b)}(0) - \pi_{nc,2}^{(b)}(1) + \pi_{nc,2}^{(b)}(0)\big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-1} \\
&+ \pi_{nc,2}^{(b)}(1)\, q_c \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-2} \Big],
\end{aligned}
\tag{17}
$$

referring back to the probabilities calculated in the introduction to Section 2.

Each of the expressions in (15)–(17) illustrates our use of those probabilities. We comment on (15) in detail and urge the reader to consider each of the others, as well as other expressions yet to come, in a similar light. In (15), and for $k \in \{0, 1, 2\}$, $q_c^k\big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{b-1-k}$ gives the conditional probability that all the $b - 1$ neighbors of a degree-$b$ neighbor $v$ of $u$ that are not $u$ are dead ends with respect to $v$ in $D$; the condition is that $|C_v \setminus \{u\}| = k$, which happens with probability $\pi_c^{(b)}(k)$. Thus $q_c^k$ is the dead-end probability for the $k$ neighbors in $C_v \setminus \{u\}$, and $(1-\alpha)q_{nc,1} + \alpha q_{nc,2}$ is the dead-end probability for each of the $b - 1 - k$ neighbors that are not in $C_v \cup \{u\}$ (such a neighbor $w$ has $c_w = 1$ with probability $1 - \alpha$ and $c_w = 2$ with probability $\alpha$).

Now, since a randomly chosen node is not in $\mathrm{GCC}_D$ if and only if each of its neighbors in $G$ is a dead end with respect to it in $D$, we have

$$
\begin{aligned}
\theta_D = 1 - P_G(0) - \sum_{a=1}^{n-1} P_G(a) \Big[ &\pi_r^{(a)}(1)\, q_c \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{a-1} \\
&+ \pi_r^{(a)}(2)\, q_c^2 \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{a-2} \Big].
\end{aligned}
\tag{18}
$$

Notice that, in (18), $P_G(0)$ must be singled out of the sum in order to be taken into account, since $\pi_r^{(0)}(1) = \pi_r^{(0)}(2) = 0$.
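One way to evaluate (15)–(18) numerically is sketched below, under the assumption that simultaneous fixed-point iteration from $(0, 0, 0)$ is adequate; the paper does not say how the system was solved, and all names are illustrative. Terms whose coefficient vanishes, such as $\pi_c^{(1)}(1)$, are skipped so that no negative power of a zero base is ever evaluated.

```python
import math


def poisson_pmf(z, amax):
    """Truncated Poisson degree distribution P_G(a), a = 0..amax."""
    p = [math.exp(-z)]
    for a in range(1, amax + 1):
        p.append(p[-1] * z / a)
    return p


def solve_dead_ends(p, alpha, tol=1e-13, max_iter=100_000):
    """Iterate Eqs. (15)-(17) from (0, 0, 0); returns (q_c, q_nc1, q_nc2)."""
    z = sum(a * pa for a, pa in enumerate(p))
    qc = q1 = q2 = 0.0
    for _ in range(max_iter):
        s = (1 - alpha) * q1 + alpha * q2          # unchosen neighbor's dead-end prob.
        nc = n1 = n2 = 0.0
        for b in range(1, len(p)):
            w = b * p[b] / z                                  # Eq. (1)
            c0 = (1 - alpha + alpha / b) / b                  # Eq. (6)
            c1 = (b - 1) / b * (1 - alpha + 3 * alpha / b)    # Eq. (7)
            c2 = alpha * (b - 1) * (b - 2) / b ** 2           # Eq. (8)
            e0, e1 = 1 / b ** 2, 2 * (b - 1) / b ** 2         # Eqs. (10)-(11)
            nc += w * (c0 * s ** (b - 1)
                       + (c1 * qc * s ** (b - 2) if b >= 2 else 0.0)
                       + (c2 * qc ** 2 * s ** (b - 3) if b >= 3 else 0.0))
            n1 += w * (1 - 1 / b + s ** (b - 1) / b)          # Eqs. (9), (16)
            n2 += w * (1 - e0 - e1 + e0 * s ** (b - 1)
                       + (e1 * qc * s ** (b - 2) if b >= 2 else 0.0))
        done = max(abs(nc - qc), abs(n1 - q1), abs(n2 - q2)) < tol
        qc, q1, q2 = nc, n1, n2
        if done:
            break
    return qc, q1, q2


def theta_D(p, alpha, qc, q1, q2):
    """Eq. (18): expected fraction of nodes inside GCC_D."""
    s = (1 - alpha) * q1 + alpha * q2
    miss = p[0]
    for a in range(1, len(p)):
        r1 = 1 - alpha + alpha / a                  # Eq. (4)
        r2 = alpha * (a - 1) / a                    # Eq. (5)
        miss += p[a] * (r1 * qc * s ** (a - 1)
                        + (r2 * qc ** 2 * s ** (a - 2) if a >= 2 else 0.0))
    return 1 - miss
```

Since every right-hand side is nondecreasing in $q_c$, $q_{nc,1}$, and $q_{nc,2}$, the iterates increase monotonically from 0 and settle on the smallest solution, the usual choice in this kind of analysis; the resulting $\theta_D$ never exceeds $\theta_G$.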

2.2 Number of messages sent 

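We denote by $P_m$ the ratio of the expected number of messages sent when disseminating $I$ through $D$'s edges to the expected number of messages sent when disseminating $I$ through the edges of a spanning tree of $\mathrm{GCC}_G$. Let $z_{\mathrm{GCC}_D}$ be the average degree of nodes in $D$ conditioned upon membership in $\mathrm{GCC}_D$. Restricting our analysis to the case in which the originator is in $\mathrm{GCC}_D$ (the other case leads to a negligible ratio), which occurs with probability $\theta_D/\theta_G$, the expected number of messages sent on the edges of $D$ is $n\theta_D z_{\mathrm{GCC}_D}$, while the expected number of messages sent on the edges of the spanning tree of $\mathrm{GCC}_G$ is $2(n\theta_G - 1)$ (in both cases every node forwards $I$ to all of its neighbors, so each edge carries two copies of $I$), thus leading to

$$
P_m = \frac{\theta_D}{\theta_G}\cdot\frac{n\theta_D z_{\mathrm{GCC}_D}}{2(n\theta_G - 1)} = \frac{\theta_D^2\, n z_{\mathrm{GCC}_D}}{2\theta_G(n\theta_G - 1)}. \tag{19}
$$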

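Now let $P_D(i \mid \mathrm{GCC}_D)$ be the probability that a randomly chosen node of $\mathrm{GCC}_D$ has degree $i$ in $D$. We have

$$
z_{\mathrm{GCC}_D} = \sum_{i=0}^{n-1} i\, P_D(i \mid \mathrm{GCC}_D). \tag{20}
$$

If we also let $P_D(i \mid a, \mathrm{GCC}_D)$ be the conditional probability that a randomly chosen node of $\mathrm{GCC}_D$ has degree $i$ in $D$ given that it has degree $a$ in $G$, then $P_D(i \mid \mathrm{GCC}_D)$ can be written as

$$
P_D(i \mid \mathrm{GCC}_D) = \sum_{a=0}^{n-1} P_D(i \mid a, \mathrm{GCC}_D)\, P_G(a \mid \mathrm{GCC}_D), \tag{21}
$$

where $P_G(a \mid \mathrm{GCC}_D)$ is the probability that the node has degree $a$ in $G$. We henceforth approximate $P_D(i \mid a, \mathrm{GCC}_D)$ by $P_D(i \mid a)$, which is the probability that a randomly chosen, degree-$a$ node of $G$ has degree $i$ in $D$ (more on this approximation in Section 2.3). Also, if $u$ is this node and $v$ one of its neighbors in $G$ such that $v \notin C_u$, then the probability that $u \in C_v$, denoted by $r$, is given by

$$
r = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big[ (1-\alpha)\,\pi_{nc,1}^{(b)}(0) + \alpha\big(\pi_{nc,2}^{(b)}(0) + \pi_{nc,2}^{(b)}(1)\big) \Big] \tag{22}
$$

and yields

$$
P_D(i \mid a) = \pi_r^{(a)}(1) \binom{a-1}{i-1} r^{i-1} (1-r)^{a-i} + \pi_r^{(a)}(2) \binom{a-2}{i-2} r^{i-2} (1-r)^{a-i}, \tag{23}
$$

that is, the degree of $u$ in $D$ is $|C_u|$ plus the number of its remaining $a - |C_u|$ neighbors that choose $u$, each of which does so independently with probability $r$.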

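Using Bayes' rule to rewrite $P_G(a \mid \mathrm{GCC}_D)$ as

$$
P_G(a \mid \mathrm{GCC}_D) = \frac{P_D(\mathrm{GCC}_D \mid a)\, P_G(a)}{P_D(\mathrm{GCC}_D)}, \tag{24}
$$

where $P_D(\mathrm{GCC}_D) = \theta_D$ and

$$
P_D(\mathrm{GCC}_D \mid a) = 1 - \pi_r^{(a)}(1)\, q_c \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{a-1} - \pi_r^{(a)}(2)\, q_c^2 \big((1-\alpha)q_{nc,1} + \alpha q_{nc,2}\big)^{a-2}, \tag{25}
$$

leads, finally, to

$$
z_{\mathrm{GCC}_D} = \sum_{i=0}^{n-1} i \sum_{a=0}^{n-1} P_D(i \mid a)\, \frac{P_D(\mathrm{GCC}_D \mid a)\, P_G(a)}{\theta_D} = \sum_{a=0}^{n-1} \frac{P_D(\mathrm{GCC}_D \mid a)\, P_G(a)}{\theta_D} \sum_{i=0}^{a} i\, P_D(i \mid a), \tag{26}
$$

where

$$
\sum_{i=0}^{a} i\, P_D(i \mid a) = \pi_r^{(a)}(1)\big(1 + (a-1)r\big) + \pi_r^{(a)}(2)\big(2 + (a-2)r\big). \tag{27}
$$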
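The per-degree quantities (22), (23), and (27) are easy to cross-check numerically: the closed form (27) should coincide with the mean of the mixture of two shifted binomials in (23). A sketch with hypothetical names, in which the degree-0 case is given the convention $P_D(0 \mid 0) = 1$ (an assumption; the text does not need that case):

```python
from math import comb


def r_prob(p, alpha):
    """Eq. (22): probability that an unchosen neighbor v of u chooses u."""
    z = sum(a * pa for a, pa in enumerate(p))
    return sum(b * p[b] / z * ((1 - alpha) / b                   # Eq. (9)
                               + alpha * (2 * b - 1) / b ** 2)   # Eqs. (10)+(11)
               for b in range(1, len(p)))


def degree_pmf_in_D(i, a, alpha, r):
    """Eq. (23): P_D(i | a); point mass at 0 for a = 0 (assumed convention)."""
    if a == 0:
        return 1.0 if i == 0 else 0.0
    pr1, pr2 = 1 - alpha + alpha / a, alpha * (a - 1) / a        # Eqs. (4)-(5)
    total = 0.0
    if 1 <= i <= a:
        total += pr1 * comb(a - 1, i - 1) * r ** (i - 1) * (1 - r) ** (a - i)
    if 2 <= i <= a:
        total += pr2 * comb(a - 2, i - 2) * r ** (i - 2) * (1 - r) ** (a - i)
    return total


def mean_degree_in_D(a, alpha, r):
    """Eq. (27): closed form of sum_i i P_D(i | a)."""
    if a == 0:
        return 0.0
    pr1, pr2 = 1 - alpha + alpha / a, alpha * (a - 1) / a
    return pr1 * (1 + (a - 1) * r) + pr2 * (2 + (a - 2) * r)
```

For a 4-regular $G$ and $\alpha = 0.5$, (22) reduces to $0.5/4 + 0.5 \cdot 7/16 = 0.34375$.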



2.3 Node degree 

Let us now consider the expected degree of $D$ conditioned upon membership in $\mathrm{GCC}_G$, which we denote by $z_{D,\mathrm{GCC}_G}$. Let $P_D(i \mid \mathrm{GCC}_G)$ be the probability that a randomly chosen node of $\mathrm{GCC}_G$ has degree $i$ in $D$. Then $z_{D,\mathrm{GCC}_G}$ is clearly given by

$$
z_{D,\mathrm{GCC}_G} = \sum_{i=0}^{n-1} i\, P_D(i \mid \mathrm{GCC}_G), \tag{28}
$$

where

$$
P_D(i \mid \mathrm{GCC}_G) = \sum_{a=0}^{n-1} P_D(i \mid a, \mathrm{GCC}_G)\, P_G(a \mid \mathrm{GCC}_G), \tag{29}
$$

$P_D(i \mid a, \mathrm{GCC}_G)$ being the probability that a randomly chosen, degree-$a$ node of $\mathrm{GCC}_G$ has degree $i$ in $D$. This lets (28) be rewritten as

$$
z_{D,\mathrm{GCC}_G} = \sum_{i=0}^{n-1} i \sum_{a=0}^{n-1} P_D(i \mid a, \mathrm{GCC}_G)\, P_G(a \mid \mathrm{GCC}_G) = \sum_{a=0}^{n-1} P_G(a \mid \mathrm{GCC}_G) \sum_{i=0}^{a} i\, P_D(i \mid a, \mathrm{GCC}_G), \tag{30}
$$

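which, using Bayes' rule to write

$$
P_G(a \mid \mathrm{GCC}_G) = \frac{P_G(\mathrm{GCC}_G \mid a)\, P_G(a)}{P_G(\mathrm{GCC}_G)} = \frac{(1 - q^a)\, P_G(a)}{\theta_G} \tag{31}
$$

(cf. (14)), yields

$$
z_{D,\mathrm{GCC}_G} = \sum_{a=0}^{n-1} \frac{(1 - q^a)\, P_G(a)}{\theta_G} \sum_{i=0}^{a} i\, P_D(i \mid a, \mathrm{GCC}_G). \tag{32}
$$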



One possibility now would be to proceed similarly to what we did in Section 2.2 and approximate $P_D(i \mid a, \mathrm{GCC}_G)$ by $P_D(i \mid a)$. This would immediately let us use (27) in (32) and be done. However, we know from early experiments like the ones to be discussed in Section 3 that, unlike the case of Section 2.2, this sometimes yields an unsatisfactory agreement between analytical prediction and simulation. In the present case, then, we look more closely at the nature of $P_D(i \mid a, \mathrm{GCC}_G)$ and first notice that (23), and consequently (27) as well, can essentially be used to express $P_D(i \mid a, \mathrm{GCC}_G)$, provided the probability $r$ appearing in it is made to depend on $a$ and on the degree-$a$ node's membership in $\mathrm{GCC}_G$. This is to be contrasted with the expression for $r$ in (22), but clearly all that needs to be changed in (22) to make these dependencies manifest is to replace $bP_G(b)/z_G$ as the probability that a given neighbor of a degree-$a$ node has degree $b$. The reason is that, given a degree-$a$ node's membership in $\mathrm{GCC}_G$, that probability is no longer independent of $a$ (as we know from our comment following (3), the existence of $\mathrm{GCC}_G$ is related to the degrees of nodes' neighbors).

Let $p$ be the probability that we seek for use in place of $bP_G(b)/z_G$. That is, $p$ is the probability that a given neighbor of a degree-$a$ node of $\mathrm{GCC}_G$ has degree $b$. The probability that a degree-$a$ node is in $\mathrm{GCC}_G$ is $1 - q^a$ (all its $a$ neighbors must otherwise be dead ends with respect to it in $G$), so the probability that a degree-$a$ node is a member of $\mathrm{GCC}_G$ and moreover a given neighbor of it has degree $b$ is $p(1 - q^a)$. But this latter probability can also be expressed as $(bP_G(b)/z_G)(1 - q^{a+b-2})$, since the rightmost factor is the probability of the condition that a pair of neighbors (one of degree $a$, the other of degree $b$) is in $\mathrm{GCC}_G$, and furthermore $bP_G(b)/z_G$, under that condition, continues to give the probability that a given neighbor of a degree-$a$ node has degree $b$. We may then write

$$
p\,(1 - q^a) = \frac{bP_G(b)}{z_G}\left(1 - q^{a+b-2}\right), \tag{33}
$$

from which it follows that

$$
p = \frac{bP_G(b)}{z_G} \cdot \frac{1 - q^{a+b-2}}{1 - q^a}. \tag{34}
$$
We note, finally, that an analogous but more complicated development could also be used to avoid the approximation of $P_D(i \mid a, \mathrm{GCC}_D)$ by $P_D(i \mid a)$ made in Section 2.2. As explained, however, that would only add needless detail.
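The corrected neighbor-degree probability of Section 2.3 can be sanity-checked numerically: because $q$ satisfies (13), summing $p$ over $b$ gives $\big(1 - q^{a-1} \cdot q\big)/(1 - q^a) = 1$ for every $a \ge 1$, and the correction shifts weight toward higher-degree neighbors. The sketch assumes Poisson degrees truncated at degree 60 and fixed-point iteration for $q$; names are illustrative.

```python
import math


def poisson_pmf(z, amax):
    """Truncated Poisson degree distribution P_G(a), a = 0..amax."""
    p = [math.exp(-z)]
    for a in range(1, amax + 1):
        p.append(p[-1] * z / a)
    return p


def solve_q(p, tol=1e-13, max_iter=100_000):
    """Smallest root of the dead-end equation for G, iterating from q = 0."""
    z = sum(a * pa for a, pa in enumerate(p))
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(b * p[b] / z * q ** (b - 1) for b in range(1, len(p)))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def neighbor_degree_pmf_in_gcc(p, q, a):
    """Corrected probability that a given neighbor of a degree-a node of
    GCC_G (a >= 1) has degree b, returned for b = 0..len(p)-1."""
    z = sum(k * pk for k, pk in enumerate(p))
    return [0.0] + [b * p[b] / z * (1 - q ** (a + b - 2)) / (1 - q ** a)
                    for b in range(1, len(p))]
```

The mean neighbor degree under the correction exceeds the uncorrected value $\langle K_G^2 \rangle / z_G$, consistent with the remark that membership in $\mathrm{GCC}_G$ favors well-connected neighbors.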



2.4 Path length 

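Let $P_\ell$ be the ratio of the expected path length from the originator in $D$ to the expected path length from the originator in $G$. We may again restrict our analysis to the case in which the originator is a member of $\mathrm{GCC}_D$, which occurs with probability $\theta_D/\theta_G$. Denoting by $\ell_{\mathrm{GCC}_G}$ and $\ell_{\mathrm{GCC}_D}$ the average path lengths from the originator in $G$ and in $D$, respectively, we have

$$
P_\ell = \frac{\theta_D\, \ell_{\mathrm{GCC}_D}}{\theta_G\, \ell_{\mathrm{GCC}_G}}. \tag{35}
$$

By (31), the average degree in $G$ of the nodes inside $\mathrm{GCC}_G$, denoted by $z_{\mathrm{GCC}_G}$, is

$$
z_{\mathrm{GCC}_G} = \sum_{a=0}^{n-1} a\, \frac{(1 - q^a)\, P_G(a)}{\theta_G}. \tag{36}
$$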

Now let two nodes be called $\ell$-neighbors in a graph, for $\ell \ge 1$, when the distance between them in the graph is $\ell$ (the case of $\ell = 1$ is simply the case of neighbors in the graph). Given a randomly chosen node $u$ in $\mathrm{GCC}_G$ and one of its neighbors, say $v$, the expected number of $v$'s other neighbors (i.e., excluding $u$), denoted by $\rho$, is¹

$$
\rho = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G}\,(b - 1), \tag{37}
$$

and then the expected number of $u$'s 2-neighbors in $G$, which we denote by $z^{(2)}_{\mathrm{GCC}_G}$, is

$$
z^{(2)}_{\mathrm{GCC}_G} = z_{\mathrm{GCC}_G}\, \rho. \tag{38}
$$



¹ The use of $bP_G(b)/z_G$ in (37) and later in (42)–(47) is in principle subject to the same corrections explained at the end of Section 2.3. However, introducing those corrections in the present context not only seems unnecessary given the computational results to be discussed in Section 3 but also would lead to much more complicated (and probably insoluble) versions of the equations from which $\ell_{\mathrm{GCC}_G}$ and $\ell_{\mathrm{GCC}_D}$ are obtained.

In general, the expected number of $u$'s $\ell$-neighbors in $G$, denoted by $z^{(\ell)}_{\mathrm{GCC}_G}$, is

$$
z^{(\ell)}_{\mathrm{GCC}_G} = z^{(\ell-1)}_{\mathrm{GCC}_G}\, \rho = z_{\mathrm{GCC}_G}\, \rho^{\ell-1}. \tag{39}
$$

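We can then obtain an approximation for $\ell_{\mathrm{GCC}_G}$ by summing the values of $z^{(\ell)}_{\mathrm{GCC}_G}$ from $\ell = 1$ up until the sum becomes equal to the expected number of nodes inside $\mathrm{GCC}_G$ minus one (to account for node $u$). Thus

$$
\sum_{\ell=1}^{\ell_{\mathrm{GCC}_G}} z^{(\ell)}_{\mathrm{GCC}_G} = n\theta_G - 1, \tag{40}
$$

which, since the left-hand side is the geometric sum $z_{\mathrm{GCC}_G}\big(\rho^{\ell_{\mathrm{GCC}_G}} - 1\big)/(\rho - 1)$, yields

$$
\ell_{\mathrm{GCC}_G} = \frac{\ln\!\left( \dfrac{(n\theta_G - 1)(\rho - 1)}{z_{\mathrm{GCC}_G}} + 1 \right)}{\ln \rho}. \tag{41}
$$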
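A direct transcription of (36), (37), and (41), with hypothetical function names. The closed form presumes $\rho > 1$, which is exactly criterion (3) and hence holds above the phase transition.

```python
import math


def rho(p):
    """Eq. (37): expected number of other neighbors of a node's neighbor."""
    z = sum(a * pa for a, pa in enumerate(p))
    return sum(b * p[b] / z * (b - 1) for b in range(1, len(p)))


def z_gcc_G(p, q, theta_g):
    """Eq. (36): average degree in G of the nodes inside GCC_G."""
    return sum(a * (1 - q ** a) * pa / theta_g for a, pa in enumerate(p))


def mean_path_length_G(n, theta_g, z_gcc, rho_):
    """Eq. (41): solve the geometric sum (40) for ell_GCC_G (needs rho_ > 1)."""
    return math.log((n * theta_g - 1) * (rho_ - 1) / z_gcc + 1) / math.log(rho_)
```

Plugging the returned length back into $z_{\mathrm{GCC}_G}(\rho^{\ell} - 1)/(\rho - 1)$ should reproduce $n\theta_G - 1$.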

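Let us now turn to obtaining the value of $\ell_{\mathrm{GCC}_D}$. Consider a randomly chosen node $u$ of $\mathrm{GCC}_D$ and a neighbor $v$ of $u$ in $G$. If $v \notin C_u$, then let $r_{nc,1}$ be the conditional probability that $u \in C_v$ given that $c_v = 1$. We have

$$
r_{nc,1} = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G}\, \pi_{nc,1}^{(b)}(0). \tag{42}
$$

Likewise, if $r_{nc,2}$ is the conditional probability that $u \in C_v$ given that $c_v = 2$, then

$$
r_{nc,2} = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \left( \pi_{nc,2}^{(b)}(0) + \pi_{nc,2}^{(b)}(1) \right). \tag{43}
$$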

Calculating the expected number of other neighbors of a neighbor $v$ of $u$ requires three cases to be considered. The first case is that of $v \in C_u$, and then the expected number of $u$'s 2-neighbors in $D$ that are reachable from $v$ is $t_c(1, r_{nc,1}, r_{nc,2})$, where

$$
\begin{aligned}
t_c(x, y, z) = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big\{ &\pi_c^{(b)}(0)\,(b-1)\big((1-\alpha)y + \alpha z\big) \\
&+ \pi_c^{(b)}(1)\big[x + (b-2)\big((1-\alpha)y + \alpha z\big)\big] \\
&+ \pi_c^{(b)}(2)\big[2x + (b-3)\big((1-\alpha)y + \alpha z\big)\big] \Big\}.
\end{aligned}
\tag{44}
$$

In the second case, $v \notin C_u$ with $c_v = 1$. We similarly let

$$
t_{nc,1}(y, z) = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big[ \pi_{nc,1}^{(b)}(0)\,(b-1)\big((1-\alpha)y + \alpha z\big) \Big], \tag{45}
$$

and then the expected number of $u$'s 2-neighbors in $D$ that are reachable from $v$ is $t_{nc,1}(r_{nc,1}, r_{nc,2})$.

The third and final case is that of $v \notin C_u$ with $c_v = 2$. As in the previous two cases, we let

$$
\begin{aligned}
t_{nc,2}(x, y, z) = \sum_{b=1}^{n-1} \frac{bP_G(b)}{z_G} \Big\{ &\pi_{nc,2}^{(b)}(0)\,(b-1)\big((1-\alpha)y + \alpha z\big) \\
&+ \pi_{nc,2}^{(b)}(1)\big[x + (b-2)\big((1-\alpha)y + \alpha z\big)\big] \Big\},
\end{aligned}
\tag{46}
$$

which yields the expected number of $u$'s 2-neighbors in $D$ that are reachable from $v$ as $t_{nc,2}(1, r_{nc,1}, r_{nc,2})$.

Now, for $u$ a randomly chosen node in $\mathrm{GCC}_D$, and recalling (24), letting

$$
\begin{aligned}
t_r(x, y, z) = \sum_{a=0}^{n-1} \frac{P_D(\mathrm{GCC}_D \mid a)\, P_G(a)}{\theta_D} \Big\{ &\pi_r^{(a)}(1)\big[x + (a-1)\big((1-\alpha)y + \alpha z\big)\big] \\
&+ \pi_r^{(a)}(2)\big[2x + (a-2)\big((1-\alpha)y + \alpha z\big)\big] \Big\}
\end{aligned}
\tag{47}
$$

allows the expected degree of $u$ to be expressed as $t_r(1, r_{nc,1}, r_{nc,2})$, and similarly the expected number of 2-neighbors of $u$ as

$$
t_r\big( t_c(1, r_{nc,1}, r_{nc,2}),\; t_{nc,1}(r_{nc,1}, r_{nc,2}),\; t_{nc,2}(1, r_{nc,1}, r_{nc,2}) \big). \tag{48}
$$

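For simplicity's sake, let $\beta_c^{c}$, $\beta_c^{nc,1}$, $\beta_c^{nc,2}$, $\beta_{nc,1}^{nc,1}$, $\beta_{nc,1}^{nc,2}$, $\beta_{nc,2}^{c}$, $\beta_{nc,2}^{nc,1}$, $\beta_{nc,2}^{nc,2}$, $\beta_r^{c}$, $\beta_r^{nc,1}$, and $\beta_r^{nc,2}$ be such that (44), (45), (46), and (47) can, respectively, be rewritten as

$$
\begin{aligned}
t_c(x, y, z) &= \beta_c^{c}\, x + \beta_c^{nc,1}\, y + \beta_c^{nc,2}\, z, \\
t_{nc,1}(y, z) &= \beta_{nc,1}^{nc,1}\, y + \beta_{nc,1}^{nc,2}\, z, \\
t_{nc,2}(x, y, z) &= \beta_{nc,2}^{c}\, x + \beta_{nc,2}^{nc,1}\, y + \beta_{nc,2}^{nc,2}\, z,
\end{aligned}
\tag{49}
$$

and

$$
t_r(x, y, z) = \beta_r^{c}\, x + \beta_r^{nc,1}\, y + \beta_r^{nc,2}\, z. \tag{50}
$$

Then, introducing the row vector

$$
A = \begin{bmatrix} \beta_r^{c} & \beta_r^{nc,1} & \beta_r^{nc,2} \end{bmatrix} \tag{51}
$$

and the matrix

$$
B = \begin{bmatrix}
\beta_c^{c} & \beta_c^{nc,1} & \beta_c^{nc,2} \\
0 & \beta_{nc,1}^{nc,1} & \beta_{nc,1}^{nc,2} \\
\beta_{nc,2}^{c} & \beta_{nc,2}^{nc,1} & \beta_{nc,2}^{nc,2}
\end{bmatrix}, \tag{52}
$$

we have that the expected number of $\ell$-neighbors of a randomly chosen node in $\mathrm{GCC}_D$, which for $\ell \ge 1$ we denote by $z^{(\ell)}_{\mathrm{GCC}_D}$, is

$$
z^{(\ell)}_{\mathrm{GCC}_D} = A\, B^{\ell-1} \begin{bmatrix} 1 \\ r_{nc,1} \\ r_{nc,2} \end{bmatrix}. \tag{53}
$$

(The reader should check that (53) yields $z_{\mathrm{GCC}_D}$ for $\ell = 1$, and also that it becomes (48) for $\ell = 2$.)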

We are now left with the task of finally obtaining $\ell_{\mathrm{GCC}_D}$. This can only be achieved numerically, and to this end we resort to the eigenvalues (say $\lambda_1$, $\lambda_2$, and $\lambda_3$, which we assume are all distinct²) and corresponding eigenvectors ($v_1$, $v_2$, and $v_3$) of $B$. If $V$ is the matrix whose columns are $v_1$, $v_2$, and $v_3$, and $\Lambda$ the matrix having diagonal elements $\lambda_1$, $\lambda_2$, and $\lambda_3$ with 0's everywhere else, then $B$ can be diagonalized into $\Lambda$ via

$$
\Lambda = V^{-1} B V, \tag{54}
$$

which can be equivalently expressed as

$$
B = V \Lambda V^{-1} \tag{55}
$$

and used to obtain the $B^{\ell-1}$ of (53) as $B^{\ell-1} = V \Lambda^{\ell-1} V^{-1}$ P].

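Proceeding in a manner analogous to the one that led to (40), we can then obtain $\ell_{\mathrm{GCC}_D}$ by numerically solving the equation

$$
\sum_{\ell=1}^{\ell_{\mathrm{GCC}_D}} z^{(\ell)}_{\mathrm{GCC}_D} = n\theta_D - 1, \tag{56}
$$

which, by (53), is equivalent to

$$
A V \begin{bmatrix}
\dfrac{\lambda_1^{\ell_{\mathrm{GCC}_D}} - 1}{\lambda_1 - 1} & 0 & 0 \\
0 & \dfrac{\lambda_2^{\ell_{\mathrm{GCC}_D}} - 1}{\lambda_2 - 1} & 0 \\
0 & 0 & \dfrac{\lambda_3^{\ell_{\mathrm{GCC}_D}} - 1}{\lambda_3 - 1}
\end{bmatrix} V^{-1} \begin{bmatrix} 1 \\ r_{nc,1} \\ r_{nc,2} \end{bmatrix} = n\theta_D - 1. \tag{57}
$$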
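Solving this last equation for $\ell_{\mathrm{GCC}_D}$ can be sketched with NumPy: the left-hand side is evaluated through the eigendecomposition $B = V \Lambda V^{-1}$, and bisection finds the (generally non-integer) length at which it reaches $n\theta_D - 1$. Bisection, the assumption that the left-hand side increases with its argument, and the function names are choices of this sketch, not the paper's stated procedure; eigenvalues are cast to complex so that non-integer powers are defined.

```python
import numpy as np


def cumulative_neighbors(A, B, w, L):
    """sum_{l=1}^{L} A B^(l-1) w via B = V diag(lam) V^-1, for real L >= 0.
    Assumes distinct eigenvalues, none equal to 1."""
    lam, V = np.linalg.eig(np.asarray(B, dtype=float))
    lam = lam.astype(complex)                 # non-integer powers need complex
    geom = (lam ** L - 1) / (lam - 1)         # closed-form geometric sums
    value = (np.asarray(A, dtype=float) @ V @ np.diag(geom)
             @ np.linalg.inv(V) @ np.asarray(w, dtype=float))
    return float(np.real(value))


def solve_path_length(A, B, w, target, tol=1e-10, max_len=1e6):
    """Bisection for L with cumulative_neighbors(A, B, w, L) = target,
    assuming the left-hand side grows monotonically in L."""
    lo, hi = 0.0, 1.0
    while cumulative_neighbors(A, B, w, hi) < target:
        if hi > max_len:
            raise ValueError("target not reached; is the dominant eigenvalue > 1?")
        lo, hi = hi, 2 * hi                   # bracket the root by doubling
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cumulative_neighbors(A, B, w, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Here `A` and `B` would hold the $\beta$ coefficients and `w` the vector $(1, r_{nc,1}, r_{nc,2})$; with a diagonal `B = diag(2, 0.5, 0.25)`, `A = (1, 0, 0)`, and target 1023, the solver returns 10, since the sum is then $2^L - 1$.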





3 Simulation results 

In addition to our mathematical analysis of Section 2, and seeking to validate it experimentally, we have carried out simulations of the uniform approach to the construction of $D$. Also, and notwithstanding the fact that our analysis has not included the degree-based approach, we have extended our simulations to cover it as well. Each of our simulations is based on disseminating $I$ on random graphs, which are always generated to be above the corresponding phase transition. This allows the simulation to be constrained to operate within the graph's largest connected component, which almost surely is a giant connected component.

We have considered two random-graph models. The first model corresponds to the classical model of Erdős and Rényi [S], in which $G$ is constructed on $n$ nodes by letting each of the possible $n(n-1)/2$ edges exist with constant probability $z/(n-1)$ for $0 < z < n - 1$. As a result, $G$ has node degrees distributed (in the limit of large $n$) according to a Poisson distribution, that is, the probability that a node has degree $a$ is $P_G(a) = e^{-z} z^a / a!$ [S]. For Poisson-distributed node degrees, it follows from (2) that the graph is above the phase transition if and only if $z > 1$, since $z_G = z$ and $\langle K_G^2 \rangle = z^2 + z$.

² This has proven true in all the experimental scenarios of Section 3, so we dwell on the matter no further.

We have concentrated on analyzing the behavior of $P_n$, $P_m$, $Z_{D,\mathrm{GCC}_G}$, and
$P_t$ for $1 < z < 10$ and $\alpha = 0.10, 0.25, 0.50, 0.75, 1.00$. To this end, for each
value of z we generated 300 random graphs with $n = 10000$ nodes, and then
constructed two instances of D for each value of α, one following the uniform
approach and the other the degree-based approach. On each instance we
then conducted 1000 disseminations, each from an originator chosen at random
among the nodes in the largest connected component of G. Finally, we
averaged the quantities of interest over all runs to obtain $P_n$, $P_m$, $Z_{D,\mathrm{GCC}_G}$, and $P_t$.
(Note that obtaining $Z_{D,\mathrm{GCC}_G}$ does not depend on any dissemination, but
only on the available G and D instances. In principle the same is true of
$P_t$, but since we simulate disseminations by breadth-first search from the
originator, $P_t$ can be obtained along the way.)
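The remark that $P_t$ comes out of the breadth-first search can be illustrated with a small sketch. The definitions used here are assumptions made for illustration only. The per-dissemination reach is taken as the number of nodes reached over D divided by the size of the originator's component in G. The stretch is taken as the ratio between summed BFS distances in D and in G over the reached nodes. The toy graphs are hypothetical. Averaging these two numbers over many originators and graph instances would then play the role of $P_n$ and $P_t$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable in adj (dict: node -> neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def one_dissemination(g_adj, d_adj, originator):
    """(reach, stretch) for one dissemination over D started at originator.

    Hypothetical definitions for this sketch:
      reach   = nodes reached in D / size of the originator's component in G
      stretch = sum of D-distances / sum of G-distances over the reached nodes
    """
    dist_g = bfs_distances(g_adj, originator)
    dist_d = bfs_distances(d_adj, originator)
    reach = len(dist_d) / len(dist_g)
    others = [v for v in dist_d if v != originator]
    if not others:
        return reach, 1.0
    stretch = sum(dist_d[v] for v in others) / sum(dist_g[v] for v in others)
    return reach, stretch


# A 4-cycle G and two subgraphs D of it.
G = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
D_path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # drops edge (3, 0)
D_split = {0: [1], 1: [0, 2], 2: [1], 3: []}     # also drops edge (2, 3)
print(one_dissemination(G, D_path, 0))   # (1.0, 1.5)
print(one_dissemination(G, D_split, 0))  # (0.75, 1.0)
```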

Figure 1 shows simulation results for random graphs having Poisson-distributed
node degrees. Parts (a-d), concerning the uniform approach, show excellent
agreement between analytical and simulation results, with only a slight deviation
in part (d), which is most likely attributable to the approximations
made in Section 2.4. The plots for $P_n$ (Figure 1(a, e)) show the expected
superiority of the degree-based approach over the uniform approach, since in
the former $P_n$ approaches 1 rapidly as z increases, more or less regardless
of α (in the uniform approach, this only seems to happen for $z < 10$ when
$\alpha > 0.50$). A closer examination of the data for the degree-based approach, say
for $z = 5$ and $\alpha = 0.50$, reveals $P_n \approx 0.998$, $P_m \approx 1.24$, $Z_{D,\mathrm{GCC}_G} \approx 2.48$, and
$P_t \approx 1.76$. That is, using roughly 1.24 times as many edges as
a spanning tree, and paths that are on average longer than those of G by a
factor of only 1.76, the dissemination subgraph reaches almost all the nodes of
the network while keeping a relatively low average node degree. Comparing the
two approaches, it is curious to note that the plots of $P_m$ (Figure 1(b, f)) are
very similar to each other, as are those of $Z_{D,\mathrm{GCC}_G}$ (Figure 1(c, g)),
which indicates that the number of edges in the dissemination subgraph
is largely independent of which approach is used. However,
the difference between the $P_n$ plots shows that the choices made by the
nodes in the degree-based approach somehow lead the edges to be placed
in a way that strongly favors the connectedness of the dissemination subgraph.

The other random-graph model we have considered is the one in which node
degrees are distributed according to a power law. In this case, and for $n \to \infty$,
the probability that a node in G has degree $a \ge 1$ is given by $P_G(a) = a^{-\tau}/\zeta(\tau)$,
where $\tau > 1$ is a parameter and $\zeta(x)$ is the Riemann zeta function [18], that is,
$\zeta(x) = \sum_{y=1}^{\infty} y^{-x}$. Then we have $Z_G = \zeta(\tau-1)/\zeta(\tau)$ and $\langle K_G^2 \rangle = \zeta(\tau-2)/\zeta(\tau)$,
so solving the phase-transition condition numerically yields $\tau < 3.47$ as the
condition for G to be above the phase transition. We have performed simulations
for $2 < \tau < 3$ in the same way as we did for the Poisson case.

Figure 1: Simulation results for the uniform (a-d) and the degree-based
(e-h) approach on random graphs having Poisson-distributed node degrees.
The plots show $P_n$ (a, e), $P_m$ (b, f), $Z_{D,\mathrm{GCC}_G}$ (c, g), and $P_t$ (d, h) for
$\alpha = 0.10, 0.25, 0.50, 0.75, 1.00$. Solid lines give the analytical predictions of
Section 2.
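The threshold $\tau < 3.47$ can be reproduced with a short numerical sketch. It assumes, as with the Poisson case, that the phase-transition condition is $\langle K_G^2 \rangle - 2Z_G > 0$. It evaluates $\zeta$ with a partial sum plus an Euler–Maclaurin tail (standard library only; the function names are hypothetical) and bisects on τ. For $2 < \tau < 3$, the range simulated here, $\zeta(\tau-2)$ diverges, so $\langle K_G^2 \rangle$ is unbounded as $n \to \infty$ and G is comfortably above the transition.

```python
def zeta(s, n_terms=1000):
    """Riemann zeta for real s > 1: partial sum plus an Euler-Maclaurin estimate of the tail."""
    partial = sum(k ** -s for k in range(1, n_terms))
    N = float(n_terms)
    tail = (N ** (1 - s) / (s - 1)          # integral of x^-s from N to infinity
            + N ** -s / 2                   # half of the first tail term
            + s * N ** (-s - 1) / 12        # B_2 correction
            - s * (s + 1) * (s + 2) * N ** (-s - 3) / 720)  # B_4 correction
    return partial + tail


def power_law_moments(tau):
    """(Z_G, <K_G^2>) for P_G(a) = a^-tau / zeta(tau); both finite only when tau > 3."""
    norm = zeta(tau)
    return zeta(tau - 1) / norm, zeta(tau - 2) / norm


def excess(tau):
    """<K_G^2> - 2 Z_G; positive means G is above the phase transition."""
    mean, second = power_law_moments(tau)
    return second - 2 * mean


def critical_tau(lo=3.1, hi=4.0, iters=60):
    """Bisection for the root of excess(tau); excess > 0 at lo and < 0 at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_tau(), 3))
```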

Random graphs with degrees thus distributed can be generated in two phases.
First, the degrees $a_1, a_2, \ldots, a_n$ of the n nodes, constituting the graph's so-called
degree sequence, are sampled repeatedly from the power law until $\sum_{i=1}^{n} a_i$ comes
out even. Then $\sum_{i=1}^{n} a_i$ labeled balls are put inside an imaginary urn, where
exactly $a_i$ of the balls are labeled i, for $1 \le i \le n$. A pair of balls, say of labels
u and v, is then withdrawn from the urn and the edge (u, v) is added to the
graph; this process is repeated until the urn becomes empty. In general this
algorithm generates a multigraph, since self-loops and multiple edges may arise.
As a last step we discard such undesirable edges, which yields a random graph
whose degree sequence approximates the one sampled.
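The two-phase urn construction can be sketched as follows. Truncating the power law at a maximum degree `a_max` is an assumption of this sketch (the law in the text is untruncated), and the function names are hypothetical. Shuffling all balls and pairing consecutive ones is equivalent to repeatedly withdrawing uniformly random pairs from the urn.

```python
import random


def sample_power_law_degrees(n, tau, a_max, rng):
    """Sample n degrees with P(a) proportional to a^-tau on 1..a_max, redrawing until the sum is even.

    Truncation at a_max is an assumption of this sketch.
    """
    support = list(range(1, a_max + 1))
    weights = [a ** -tau for a in support]
    while True:
        degrees = rng.choices(support, weights=weights, k=n)
        if sum(degrees) % 2 == 0:
            return degrees


def urn_graph(degrees, rng):
    """Urn construction: node i contributes degrees[i] balls labelled i.

    Shuffling the urn and pairing consecutive balls draws the pairs uniformly at random.
    Self-loops and repeated edges are discarded, so final degrees never exceed the sampled ones.
    """
    balls = [i for i, a in enumerate(degrees) for _ in range(a)]
    rng.shuffle(balls)
    edges = set()
    for u, v in zip(balls[0::2], balls[1::2]):
        if u != v:                                 # drop self-loops
            edges.add((min(u, v), max(u, v)))      # the set drops multiple edges
    return edges


rng = random.Random(42)
degrees = sample_power_law_degrees(1000, 2.5, 999, rng)
edges = urn_graph(degrees, rng)
print(sum(degrees) // 2, len(edges))  # pairs drawn from the urn vs. simple edges kept
```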

Figure 2 shows simulation results for random graphs having node degrees
distributed according to a power law. The plots for the uniform approach
(Figure 2(a-d)) show poor results, as $P_n$ stays well below 1 for most values of τ, but
they nonetheless corroborate the analytical predictions of Section 2 in parts
(a), (b), and (c). For part (d) no analytical result is given, since the corresponding
equations, among them (53), do not converge in this case, as similarly observed in [...].
The plots for the degree-based approach (Figure 2(e-h)), in turn, show excellent
results for $\tau < 2.4$. In this range, $P_n \approx 1$ and both $P_m$ and $P_t$ are slightly
above 1, regardless of the value of α, thus demonstrating that the dissemination
subgraph is very close to a spanning tree. As for $Z_{D,\mathrm{GCC}_G}$, it stays below
roughly 2.25 throughout the entire range of τ values.

4 Resilience and adaptability 

4.1 Resilience to node and link failures 

Let $\gamma_1$ and $\gamma_2$ be the probabilities that a given node and a given link,
respectively, are operational. Letting $\gamma = \gamma_1\gamma_2$ be the probability that a given
transmission is successful, we now consider the problem of using a dissemination
subgraph to disseminate information when each transmission fails with probability $1 - \gamma$.
(Note, before we begin, that a simple protocol employing acknowledgement
messages to ensure reliable transmissions can be used when γ is substantially
below 1. In spite of this, our interest is to verify what happens to the value of
$P_n$ when a failure may occur and no additional message is sent to make up for
it.)

Let us consider what happens to $\mathrm{GCC}_G$ when failures may occur. To this end,
let G' be the graph obtained from G by independently removing every edge with
probability $1 - \gamma$. Employing the same nomenclature as in Section 2, a node u
of G' is outside $\mathrm{GCC}_{G'}$ if and only if each of its neighbors v in G either is not a
neighbor of u in G' or is a dead end with respect to u in G'. Considering a randomly
chosen node u, let q' be the probability that a given neighbor v of u in G is a dead end
with respect to u in G'. We have

$$q' = \sum_{b=1}^{n-1} \frac{b\,P_G(b)}{Z_G}\,(1-\gamma+\gamma q')^{b-1}, \qquad (58)$$

where $b P_G(b)/Z_G$ is the probability that a neighbor of u has degree b, and
$(1-\gamma+\gamma q')^{b-1}$ indicates, when b is the degree of v, that each of the
neighbors of v in G other than u either is not a neighbor of v in G' or is itself
a dead end with respect to v in G'. So, if $\theta_{G'}$ is the expected relative size of
$\mathrm{GCC}_{G'}$, we obtain

$$\theta_{G'} = 1 - \sum_{a=0}^{n-1} P_G(a)\,(1-\gamma+\gamma q')^{a}. \qquad (59)$$

Figure 2: Simulation results for the uniform (a-d) and the degree-based
(e-h) approach on random graphs having node degrees distributed as a power
law. The plots show $P_n$ (a, e), $P_m$ (b, f), $Z_{D,\mathrm{GCC}_G}$ (c, g), and $P_t$ (d, h)
for $\alpha = 0.10, 0.25, 0.50, 0.75, 1.00$. Solid lines give the analytical predictions of
Section 2.

Figure 3: Simulation results of the degree-based approach with $\gamma = 0.95$ on
random graphs having Poisson (a) and power-law (b) node-degree distributions
for $\alpha = 0.50, 0.75, 1.00$. Solid lines give the analytical predictions for $\theta_{G'}/\theta_G$.

We have carried out simulations on random graphs having $n = 10000$ and
degrees distributed according to either a Poisson distribution with $1 < z < 10$ or
a power law with $2 < \tau < 3$. Our aim has been to analyze $P_n$ in the degree-based
approach when $\gamma = 0.95$, i.e., when each transmission succeeds with probability
0.95. These simulations have followed the same methodology as in Section 3.

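Notice that an upper bound on $P_n$ for a given γ can be obtained by considering
a dissemination over all the edges of G', since a dissemination over D under failures
can only use edges that survive in G'. Such a bound is thus $\theta_{G'}/\theta_G$. The
degree-based approach to the construction of D is therefore resilient to failures
to the extent that $P_n$ is close to $\theta_{G'}/\theta_G$.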
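The bound $\theta_{G'}/\theta_G$ can be evaluated numerically from (58) and (59), written here in bond-percolation form, where each neighbor contributes a factor $1-\gamma+\gamma q'$. The sketch below solves (58) by fixed-point iteration for a truncated Poisson degree distribution. The truncation point, tolerance, and function names are assumptions of the sketch. For Poisson degrees, removing edges with probability $1-\gamma$ simply rescales z to $\gamma z$, which gives a convenient self-check.

```python
import math


def poisson_pmf(z, a_max):
    """P_G(a) for a = 0..a_max via p(a) = p(a - 1) * z / a (truncated Poisson)."""
    pmf = [math.exp(-z)]
    for a in range(1, a_max + 1):
        pmf.append(pmf[-1] * z / a)
    return pmf


def gcc_fraction(pmf, gamma, tol=1e-13, max_iter=100_000):
    """theta_{G'} from (58)-(59), with x = 1 - gamma + gamma * q'.

    Iterating q' <- sum_b b P(b)/Z_G * x^(b-1) from q' = 0 increases monotonically
    to the smallest fixed point in [0, 1], which is the relevant one.
    """
    mean = sum(a * p for a, p in enumerate(pmf))
    q = 0.0
    for _ in range(max_iter):
        x = 1 - gamma + gamma * q
        new_q = sum(b * p / mean * x ** (b - 1) for b, p in enumerate(pmf) if b >= 1)
        converged = abs(new_q - q) < tol
        q = new_q
        if converged:
            break
    x = 1 - gamma + gamma * q
    return 1 - sum(p * x ** a for a, p in enumerate(pmf))


def resilience_bound(pmf, gamma):
    """Upper bound theta_{G'} / theta_G on P_n when each transmission succeeds with probability gamma."""
    return gcc_fraction(pmf, gamma) / gcc_fraction(pmf, 1.0)


pmf = poisson_pmf(5.0, 80)
print(resilience_bound(pmf, 0.95))
```

Swapping `poisson_pmf` for a truncated power-law pmf gives the corresponding curve for the second random-graph model.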

Figure 3 shows the results for $\alpha = 0.50, 0.75, 1.00$ and indicates
how resilient the dissemination subgraph is to transmission failures. In both
the Poisson case (part (a) of the figure) and the power-law case (part (b)),
$P_n$ approaches $\theta_{G'}/\theta_G$ as G gets denser (i.e., higher z or lower τ, as the
case may be).

4.2 Adaptability to topology changes 

We now take a brief look at how a dissemination subgraph D can be made
to cope with dynamic topology changes in G. As is customary in such cases,
we model the addition or removal of a node as, respectively, the addition or
removal of the edges that are incident to it. It then suffices to consider the
addition or removal of single edges, in which context we further assume that the
two end nodes of the edge in question detect its appearance or disappearance
instantaneously.

The crux of this adaptability issue is that D, being constructed by strictly
local actions of the nodes, undergoes changes that affect only a small vicinity
of the edge being added or removed (in contrast with other situations, cf., e.g.,
[...], in which the impact of topological changes spreads much more widely).
Let (u, v) be an edge that is added to or removed from G. In the uniform approach,
only u and v need to remake their choices; in the degree-based approach, this
holds for u and v and also for their neighbors, whose choices are affected by
the degree of u or v, as the case may be.

5 Conclusions 

In this paper we have considered the use of a spanning subgraph for disseminating
a piece of information, originally known to a single node, to all the other
nodes of an unstructured network. We have introduced two local heuristics,
referred to as the uniform and the degree-based approach, for building what
we call a dissemination subgraph. As we argued in Section 4, the intrinsically
local nature of the heuristics gives the dissemination subgraph a degree of
resilience to failures, and also makes it relatively easy to adapt to
topological changes.

We have contributed an innovative mathematical analysis of the uniform
approach, one that we hope can be extended to the degree-based approach
as well, and that may also inspire the mathematical analysis of similar problems.
Our simulations on random graphs corroborate our analytical results for the uniform
approach and demonstrate the efficacy of the degree-based approach, in terms of
the indicators considered, for networks in which node degrees follow a Poisson
distribution or a power law.

We find it remarkable that independent, strictly local decisions by the nodes
of a complex network can give rise to a global structure that in many cases
comes very close, on average, to a subgraph with important properties for use
as a substrate for information dissemination. These properties include the
ability to reach nearly every node in the originator's connected component
in the network, and to do so with relatively modest requirements concerning
the overall number of messages and per-node transmission bandwidth. They also
include stretching paths only by a small factor when compared to the
corresponding paths in the network.